How popular is the President?
Experimenting with a Gaussian Process to model presidential popularity across time.
This notebook explores how well a Gaussian Process model can predict French presidents' popularity since the presidential term was shortened to 5 years.
PARTIES = {
"chirac2": "right",
"sarkozy": "right",
"hollande": "left",
"macron": "center",
}
def standardize(series):
"""Standardize a pandas series"""
return (series - series.mean()) / series.std()
all_presidents = pd.read_excel(
"../data/raw_popularity_presidents.xlsx", index_col=0, parse_dates=True
)
d = all_presidents.loc[all_presidents.index >= pd.to_datetime("2002-05-05")].copy()
# convert to proportions
d[["approve_pr", "disapprove_pr"]] = d[["approve_pr", "disapprove_pr"]].copy() / 100
d = d.rename(columns={"approve_pr": "p_approve", "disapprove_pr": "p_disapprove"})
# raw monthly average to get fixed time intervals
# TODO: replace by poll aggregation
d = d.groupby("president").resample("M").mean().reset_index(level=0).sort_index()
d["party"] = d.president.replace(PARTIES)
ELECTION_FLAGS = (d.index.month == 5) & d.index.year.isin([2002, 2007, 2012, 2017])
d["election_flag"] = 0
d.loc[ELECTION_FLAGS, "election_flag"] = 1
# convert to nbr of successes
d["N_approve"] = d.samplesize * d["p_approve"]
d["N_disapprove"] = d.samplesize * d["p_disapprove"]
d[["N_approve", "N_disapprove"]] = d[["N_approve", "N_disapprove"]].round().astype(int)
# compute total trials
d["N_total"] = d.N_approve + d.N_disapprove
d
def dates_to_idx(timelist):
"""Convert datetimes to numbers in reference to a given date. Useful for posterior predictions."""
reference_time = timelist[0]
t = (timelist - reference_time) / np.timedelta64(1, "M")
return np.asarray(t)
time = dates_to_idx(d.index)
time[:10]
Note: this whole section about priors is outdated and needs to be revised.
Priors are very important to fit GPs properly, so let's spend some time thinking about our priors for a more refined model of the popularity of the president. We start by the priors for the lengthscale and period parameters:
- ls_trend: the lengthscale of the long-term trend. It has a wide prior, with most of its mass between 1 and 3 years.
- ls_med: the lengthscale of the short-to-medium-term variations. This prior has most of its mass below 2 years.
- scale_mixture_rate of the Rational Quadratic kernel: this kernel is equivalent to a combination of Exponentiated Quadratic kernels with different lengthscales. As the scale mixture rate approaches infinity, it becomes exactly equivalent to the Exponentiated Quadratic kernel. We center the prior for this parameter around 3, since we expect somewhat more variation than a single Exponentiated Quadratic kernel could explain.
- period of the semi-periodic component: we don't have a strong prior on this, so we center the period at one year, allowing for short-term seasonality as well as longer seasonality.
- ls_period: the smoothness of the semi-periodic component. It controls how "sinusoidal" the periodicity is. The plot of the data shows that seasonality is quite far from a sine wave, so we use a Gamma whose mode is at 2, with a relatively large variance.
- ls_period_decay: the periodic decay. The smaller this parameter, the faster the periodicity fades away. I suspect the seasonality of popularity fades quite quickly, so let's put most of the prior mass between 6 months and 2 years.
If you're a bit lost, that's quite normal: GP parameters are not easily interpretable, so it takes some time to develop intuition about them -- all the more so because the Gamma distribution is very flexible and can take a lot of different shapes. Here are good educational resources to think about priors in the context of GPs:
- PyMC3's CO2 at Mauna Loa example notebook.
- PyMC3's notebook for mean and covariance functions.
- PyMC4's notebook for mean and covariance functions.
- Michael Betancourt's "Probabilistic Building Blocks" case-study.
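Before plotting, it can also help to quantify these priors numerically. Using scipy (whose gamma distribution takes scale = 1/beta, matching PyMC3's rate beta), we can check how much mass a parameterization puts in a given range -- here with the Gamma(alpha=5, beta=2) prior used for ls_trend in the model below:

```python
from scipy import stats

# PyMC3's Gamma(alpha, beta) is scipy's gamma(a=alpha, scale=1 / beta)
ls_prior = stats.gamma(a=5, scale=1 / 2)

# prior mass between 1 and 5 months
mass = ls_prior.cdf(5) - ls_prior.cdf(1)
print(f"P(1 <= ls_trend <= 5) = {mass:.2f}")  # about 0.92
```

The same one-liner works for any of the Gamma priors above, which makes it easy to iterate on alpha and beta until the mass sits where you want it.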
x = np.linspace(0, 120, 500)
priors = [
(r"$\alpha$=5, $\beta$=2", pm.Gamma.dist(alpha=5, beta=2)),
(r"$\alpha$=2, $\beta$=0.5", pm.Gamma.dist(alpha=2, beta=0.5)),
(r"$\alpha$=9, $\beta$=1", pm.Gamma.dist(alpha=9, beta=1)),
(r"$\alpha$=20, $\beta$=1", pm.Gamma.dist(alpha=20, beta=1)),
]
fig = plt.figure(figsize=(12, 5))
for i, prior in enumerate(priors):
plt.plot(x, np.exp(prior[1].logp(x).eval()), label=prior[0])
plt.xlim((-1, 40))
plt.xlabel("Months")
plt.ylabel("Density")
plt.title("Lengthscale priors")
plt.legend();
Now we have to think about priors for our scale parameters: amplitude_trend (the scale of the long-term trend), amplitude_med (the scale of the short-to-medium-term component), and amplitude_per (the scale of the semi-periodic component). We don't have a lot of prior information about these parameters, so let's compare two weakly informative choices:
x = np.linspace(0, 12, 500)
priors = [
("Exponential", pm.Exponential.dist(1)),
("HalfNormal", pm.HalfNormal.dist(1)),
]
fig = plt.figure(figsize=(12, 5))
for i, prior in enumerate(priors):
plt.plot(x, np.exp(prior[1].logp(x).eval()), label=prior[0])
plt.xlabel("Months")
plt.ylabel("Density")
plt.title("Amplitude priors")
plt.legend();
And now we can use these priors to simulate samples from the whole GP prior and see if our choices make sense:
amplitude_trend = pm.HalfNormal.dist(1.0).random(1)
ls_trend = pm.Gamma.dist(alpha=5, beta=2).random(1)
cov_trend = amplitude_trend ** 2 * pm.gp.cov.Matern52(1, ls_trend)
prior_timepoints = np.linspace(0, 60, 200)[:, None]
K = cov_trend(prior_timepoints).eval()
gp_prior_samples = pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K).random(size=20_000)
_, (left, mid, right) = plt.subplots(
1, 3, figsize=(16, 5), constrained_layout=True, sharex=True, sharey=True
)
for ax, samples in zip((left, mid, right), (5, 10, 100)):
ax.plot(
prior_timepoints,
gp_prior_samples[:samples].T,
color="darkblue",
alpha=0.3,
)
ax.set_title("Samples from the GP prior")
ax.set_xlabel("Time in months")
ax.set_ylabel("Popularity evolution");
_, ax = plt.subplots(1, 1, figsize=(14, 5))
ax.plot(
prior_timepoints.flatten(), np.median(gp_prior_samples, axis=0), color="darkblue"
)
az.plot_hdi(
prior_timepoints.flatten(),
gp_prior_samples,
hdi_prob=0.2,
ax=ax,
color="darkblue",
fill_kwargs={"alpha": 0.4},
)
az.plot_hdi(
prior_timepoints.flatten(),
gp_prior_samples,
hdi_prob=0.4,
ax=ax,
color="darkblue",
fill_kwargs={"alpha": 0.3},
)
az.plot_hdi(
prior_timepoints.flatten(),
gp_prior_samples,
hdi_prob=0.6,
ax=ax,
color="darkblue",
fill_kwargs={"alpha": 0.2},
)
az.plot_hdi(
prior_timepoints.flatten(),
gp_prior_samples,
hdi_prob=0.8,
ax=ax,
color="darkblue",
fill_kwargs={"alpha": 0.1},
)
ax.set_title("Prior marginal quantiles from the GP")
ax.set_xlabel("Time in months")
ax.set_ylabel("Popularity evolution");
Finally, let's pick a prior for our baseline parameter, i.e. the intercept of our Gaussian Process regression. In other words, this will be the mean function of our GP -- the value it reverts to when data become sparse. Here we have quite a lot of information: 50% popularity is historically quite high for a French president (just take a look at the whole dataset we loaded at the beginning of the notebook), so keeping the mean at zero is sub-optimal -- remember that baseline lives on the logit scale, so a prior centered at 0 means a prior centered at $logistic(0) = 0.5$ on the outcome space.
We can do better: based on our domain knowledge, we expect most presidents to have a baseline popularity between 20% and 50% -- in other words, French people rarely love their presidents but often really dislike them. $Normal(-0.7, 0.5)$ looks reasonable in that regard: it puts about 95% of the probability mass between -1.7 and 0.3, i.e. between $logistic(-1.7) = 15\%$ and $logistic(0.3) = 57\%$, with a mean approval of $logistic(-0.7) = 33\%$:
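These percentages are easy to verify numerically with scipy's expit, which is the logistic function:

```python
from scipy.special import expit  # the logistic function

# Normal(-0.7, 0.5): roughly 95% of the mass lies within 2 standard deviations
lo, mean, hi = expit(-0.7 - 2 * 0.5), expit(-0.7), expit(-0.7 + 2 * 0.5)
print(f"{lo:.0%} -- {mean:.0%} -- {hi:.0%}")  # 15% -- 33% -- 57%
```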
baseline_prior_samples = pm.Normal.dist(-0.7, 0.5).random(size=20_000)
ax = az.plot_kde(
logistic(baseline_prior_samples),
label="baseline ~ $Normal(-0.7, 0.5)$",
)
ax.set_xlim((0, 1))
ax.set_xlabel("Baseline presidential popularity")
ax.set_ylabel("Density")
ax.set_title("Baseline prior");
honeymoon_prior_samples = pm.Normal.dist(-0.5, 0.3).random(size=20_000)
ax = az.plot_kde(
logistic(honeymoon_prior_samples),
label="honeymoon_effect ~ $Normal(-0.5, 0.3)$",
)
ax.set_xlim((0, 1))
ax.set_xlabel("Bonus in popularity due to honeymoon effect")
ax.set_ylabel("Density")
ax.set_title("Honeymoon effect prior")
ax.legend(fontsize=12);
unemp_effect_prior_samples = pm.Normal.dist(0.0, 0.2).random(size=20_000)
fake_unemp = np.linspace(-3, 3, 200)[None, :]
prior_approval = logistic(
baseline_prior_samples[:, None]
+ gp_prior_samples
+ unemp_effect_prior_samples[:, None] * fake_unemp
)
The unemployment data can be found here.
- Unemployment is only observed quarterly, so we forward-fill it to get monthly values. These data are therefore uncertain, which is why the model adds a measurement-error term on the standardized log-unemployment.
- The likelihood is over-dispersed (Beta-Binomial rather than Binomial), to account for polling error beyond pure sampling variance.
Also, the model will care about the order of magnitude of unemployment, not its intrinsic values, so we take the log of unemployment. Finally, we need to standardize the data (mean 0 and standard deviation 1) so that our sampler has an easier time. Let's set this up:
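On hypothetical numbers, the log-then-standardize transformation looks like this (the unemployment rates below are made up for illustration):

```python
import numpy as np
import pandas as pd

def standardize(series):
    """Standardize a pandas series (same helper as defined earlier)."""
    return (series - series.mean()) / series.std()

# hypothetical quarterly unemployment rates, in percent
unemployment = pd.Series([7.5, 8.1, 9.0, 9.6, 10.1, 9.8])

# log first (orders of magnitude), then center and scale
stdz_log_unemp = standardize(np.log(unemployment))
print(stdz_log_unemp.round(2))  # mean 0, std 1
```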
x_plot = np.linspace(0, 1, 100)
pbar = 0.5
theta = 10.0
plt.plot(
x_plot,
np.exp(pm.Beta.dist(pbar * theta, (1 - pbar) * theta).logp(x_plot).eval()),
label=f"Beta({pbar * theta}, {(1 - pbar) * theta})",
)
plt.xlabel("Probability")
plt.ylabel("Density")
plt.legend();
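This Beta(pbar * theta, (1 - pbar) * theta) parameterization is exactly what drives the over-dispersion in the Beta-Binomial likelihood we'll use below: compared to a plain Binomial with the same mean, the Beta-Binomial spreads its mass much more widely. A quick check with scipy (n = 1000 is just an illustrative poll sample size):

```python
from scipy import stats

n, pbar, theta = 1000, 0.5, 10.0

binom = stats.binom(n, pbar)
betabinom = stats.betabinom(n, pbar * theta, (1 - pbar) * theta)

# same mean, but the Beta-Binomial variance is much larger
print(f"Binomial variance:      {binom.var():.0f}")
print(f"Beta-Binomial variance: {betabinom.var():.0f}")
```

The smaller theta is, the stronger the over-dispersion, which is why the model below estimates theta from the data (with an offset keeping it above 10).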
COORDS = {"timesteps": d.index}
with pm.Model(coords=COORDS) as econ_latent_gp:
# intercept on logit scale
baseline = pm.Normal("baseline", -0.7, 0.5)
# honeymoon slope
honeymoon = pm.Normal("honeymoon", -0.5, 0.3)
# log unemployment slope
log_unemp_effect = pm.Normal("log_unemp_effect", 0.0, 0.2)
# long term trend
amplitude_trend = pm.HalfNormal("amplitude_trend", 1.0)
# ls_trend = pm.InverseGamma("ls_trend", alpha=6, beta=14)
ls_trend = pm.Gamma("ls_trend", alpha=5, beta=2)
cov_trend = amplitude_trend ** 2 * pm.gp.cov.Matern52(1, ls_trend)
# instantiate gp
gp = pm.gp.Latent(cov_func=cov_trend)
# evaluate GP at time points
f_time = gp.prior("f_time", X=time[:, None])
# data
election_flag = pm.Data("election_flag", d.election_flag.values, dims="timesteps")
stdz_log_unemployment = pm.Data(
"stdz_log_unemployment",
standardize(np.log(d.unemployment)).values,
dims="timesteps",
)
    # unemployment data is uncertain:
    # sigma = 0.1 says each point is expected within 20% of the data's std, 95% of the time
u_diff = pm.Normal("u_diff", mu=0.0, sigma=0.1, dims="timesteps")
u_uncert = stdz_log_unemployment + u_diff
# overdispersion parameter
theta = pm.Exponential("theta_offset", 1.0) + 10.0
p = pm.Deterministic(
"p",
pm.math.invlogit(
baseline + f_time + honeymoon * election_flag + log_unemp_effect * u_uncert
),
dims="timesteps",
)
y = pm.BetaBinomial(
"y",
alpha=p * theta,
beta=(1.0 - p) * theta,
n=d.N_total,
observed=d.N_approve,
dims="timesteps",
)
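The measurement-error comment on u_diff above can be checked numerically: with sigma = 0.1, about 95% of the error draws fall within plus or minus 0.2, i.e. within 20% of a standard deviation of the standardized data:

```python
from scipy import stats

u_diff_prior = stats.norm(loc=0.0, scale=0.1)

# probability that the error stays within +/- 0.2
within = u_diff_prior.cdf(0.2) - u_diff_prior.cdf(-0.2)
print(f"P(|u_diff| < 0.2) = {within:.3f}")  # about 0.954
```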
with econ_latent_gp:
trace_econ = pm.sample(
draws=2000,
chains=8,
cores=8,
return_inferencedata=True,
idata_kwargs={
"dims": {"f_time": ["timesteps"], "f_time_rotated_": ["timesteps"]}
},
)
az.summary(
trace_econ, var_names=["~u_diff", "~p", "~f_time", "~f_time_rotated_"], round_to=2
)
az.plot_trace(
trace_econ, compact=True, var_names=["~u_diff", "~p", "~f_time", "~f_time_rotated_"]
);
az.plot_trace(
trace_econ,
var_names=["u_diff", "p", "f_time"],
compact=True,
coords={"timesteps": trace_econ.observed_data.timesteps[:110]},
);
az.plot_trace(
trace_econ,
var_names=["u_diff", "p", "f_time"],
compact=True,
coords={"timesteps": trace_econ.observed_data.timesteps[110:]},
);
az.plot_pair(
trace_econ,
var_names=["~u_diff", "~p", "~f_time", "~f_time_rotated_"],
divergences=True,
);
az.plot_rank(trace_econ, var_names=["~u_diff", "~p", "~f_time", "~f_time_rotated_"]);
az.plot_parallel(trace_econ, var_names=["~u_diff", "~p", "~f_time", "~f_time_rotated_"]);
MAX_OBSERVED = len(d.index)
TIME_BEFORE_ORIGIN = 0
MAX_TIME = MAX_OBSERVED + 3 # 1 quarter out-of-sample
RANGE_OOS = MAX_TIME + TIME_BEFORE_ORIGIN
tnew = np.linspace(-TIME_BEFORE_ORIGIN, MAX_TIME, RANGE_OOS)[:, None]
log_unemp = np.log(d.unemployment)
# unemployment stays around last value
ppc_unemp = np.random.normal(
loc=d.unemployment.iloc[-1],
scale=d.unemployment.std(),
size=(MAX_TIME - MAX_OBSERVED) // 3,
)
# data only observed quarterly, so need to forward-fill
ppc_unemp = np.repeat(ppc_unemp, repeats=3)
# log data and scale
stdz_log_ppc_unemp = (np.log(ppc_unemp) - log_unemp.mean()) / log_unemp.std()
# add noise around values
oos_unemp = stdz_log_ppc_unemp + np.random.normal(
loc=trace_econ.posterior["u_diff"].mean(),
scale=trace_econ.posterior["u_diff"].std(),
size=MAX_TIME - MAX_OBSERVED,
)
oos_unemp
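The forward-fill above amounts to np.repeat on the quarterly draws: each quarterly value is held constant for the 3 months of its quarter. On hypothetical numbers:

```python
import numpy as np

quarterly = np.array([8.0, 8.2, 7.9])      # one value per quarter
monthly = np.repeat(quarterly, repeats=3)  # each value held for 3 months
print(monthly)
```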
ppc_unemp_10 = np.random.normal(
loc=10.0, scale=d.unemployment.std(), size=(MAX_TIME - MAX_OBSERVED) // 3
)
# data only observed quarterly, so need to forward-fill
ppc_unemp_10 = np.repeat(ppc_unemp_10, repeats=3)
# log data and scale
stdz_log_ppc_unemp_10 = (np.log(ppc_unemp_10) - log_unemp.mean()) / log_unemp.std()
# add noise around values
oos_unemp_10 = stdz_log_ppc_unemp_10 + np.random.normal(
loc=trace_econ.posterior["u_diff"].mean(),
scale=trace_econ.posterior["u_diff"].std(),
size=MAX_TIME - MAX_OBSERVED,
)
oos_unemp_10
ppc_unemp_5 = np.random.normal(
loc=5.0, scale=d.unemployment.std(), size=(MAX_TIME - MAX_OBSERVED) // 3
)
# data only observed quarterly, so need to forward-fill
ppc_unemp_5 = np.repeat(ppc_unemp_5, repeats=3)
# log data and scale
stdz_log_ppc_unemp_5 = (np.log(ppc_unemp_5) - log_unemp.mean()) / log_unemp.std()
# add noise around values
oos_unemp_5 = stdz_log_ppc_unemp_5 + np.random.normal(
loc=trace_econ.posterior["u_diff"].mean(),
scale=trace_econ.posterior["u_diff"].std(),
size=MAX_TIME - MAX_OBSERVED,
)
oos_unemp_5
pp_prop = logistic(
trace_econ.predictions["baseline"]
+ trace_econ.predictions["f_time_new"]
+ trace_econ.predictions["honeymoon"]
* trace_econ.predictions_constant_data["election_flag"]
+ trace_econ.predictions["log_unemp_effect"]
* trace_econ.predictions_constant_data["stdz_log_unemployment"]
)
pp_prop_10 = logistic(
trace_econ.predictions["baseline"]
+ trace_econ.predictions["f_time_new"]
+ trace_econ.predictions["honeymoon"]
* trace_econ.predictions_constant_data["election_flag"]
+ trace_econ.predictions["log_unemp_effect"]
* xr.DataArray(
np.concatenate((standardize(log_unemp).values, oos_unemp_10)),
dims=["timesteps"],
coords=PREDICTION_COORDS,
)
)
pp_prop_5 = logistic(
trace_econ.predictions["baseline"]
+ trace_econ.predictions["f_time_new"]
+ trace_econ.predictions["honeymoon"]
* trace_econ.predictions_constant_data["election_flag"]
+ trace_econ.predictions["log_unemp_effect"]
* xr.DataArray(
np.concatenate((standardize(log_unemp).values, oos_unemp_5)),
dims=["timesteps"],
coords=PREDICTION_COORDS,
)
)
raw_polls = all_presidents.loc[all_presidents.index >= pd.to_datetime("2002-05-05")].copy()
# convert to proportions
raw_polls[["approve_pr", "disapprove_pr"]] = (
raw_polls[["approve_pr", "disapprove_pr"]].copy() / 100
)
raw_polls = raw_polls.rename(
columns={"approve_pr": "p_approve", "disapprove_pr": "p_disapprove"}
)